library(jsonlite) |> suppressPackageStartupMessages()
library(tidyverse) |> suppressPackageStartupMessages()
library(maps) |> suppressPackageStartupMessages()
library(sf) |> suppressPackageStartupMessages()
library(readxl) |> suppressPackageStartupMessages()
library(spatstat) |> suppressPackageStartupMessages()
library(mapview) |> suppressPackageStartupMessages()
library(spdep) |> suppressPackageStartupMessages()
library(spatialreg) |> suppressPackageStartupMessages()
library(patchwork) |> suppressPackageStartupMessages()
set.seed(20001015)
cb_palette <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")Untitled
Load Libraries
1.Introduction
I am interested in studying the geographical distribution of martyrs’ memorial facilities in China. According to the Regulations on the Protection and Management of Martyrs’ Memorial Facilities, these facilities include martyrs’ cemeteries, martyrs’ graves, martyrs’ columbariums, walls of martyrs’ names, memorial halls, monuments, memorial pavilions, memorial towers, memorial shrines, memorial statues, and memorial squares (none is more familiar with those facilities than a DC citizen). These facilities can be seen as remnants of the Communist Revolution in China during the early 20th century. Their distribution reflects various aspects, such as the number of individuals from a region who sacrificed their lives for the Communist Revolution, the significance of these locations in the revolution (akin to Lexington’s role in the American Revolutionary War), former revolutionary base areas and battle sites, or perhaps simply the local government’s interest in ideological promotion (e.g., if a city has little connection to the Communist Revolution but still hosts a notable number of such facilities).
The data I am using was found in an Excel document on a Chinese econometrics web forum. However, I have not been able to identify the author or any related publication using this dataset. The dataset includes 3,966 observations and appears sufficiently exclusive, even if it does not cover all those facilities. Additionally, I conducted a sampling-based search and cross-validation for some of the facilities, confirming that the dataset provides accurate facility name, latitude and longitude information, except for approximately 100 facilities whose coordinates were incorrectly placed outside China’s borders. Despite its imperfections, this dataset offers a solid starting point for exploring the geographical distribution of martyrs’ memorial facilities and the spatial characteristics of the Chinese Communist Revolution.
2.Visualize Martyrs’ Memorial Facilities
2.1. Loading China’s Map
Although rnaturalearth can provide China map, it’s not accurate enough, especially for city level. Therefore, instead, I choose to use a China map I found online.
Reading layer `100000_full_city' from data source
`https://geo.datav.aliyun.com/areas_v3/bound/100000_full_city.json'
using driver `GeoJSON'
Simple feature collection with 477 features and 10 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 73.50647 ymin: 3.397162 xmax: 135.0946 ymax: 53.56362
Geodetic CRS: WGS 84
However, this approach also has its challenges. First, the four directly administered municipalities—Beijing, Shanghai, Tianjin, and Chongqing—are subdivided into district-level units (e.g., Beijing is divided into 16 “cities”). However, most statistical data is not available at the district level but treats these four municipalities as single cities. Therefore, the polygons of these four municipalities must first be merged.
###Beijing
beijing <- china %>%
slice(1:16) %>%
mutate(geometry = st_make_valid(geometry)) %>%
summarise(geometry = st_union(geometry)) %>%
mutate(name = "北京市") %>%
select(name,geometry)
china <- china %>%
slice(-1:-16) %>%
bind_rows(beijing)
###Tianjin
tianjin <- china %>%
slice(1:16) %>%
mutate(geometry = st_make_valid(geometry)) %>%
summarise(geometry = st_union(geometry)) %>%
mutate(name = "天津市") %>%
select(name,geometry)
china <- china %>%
slice(-1:-16) %>%
bind_rows(tianjin)
###Shanghai
shanghai <- china %>%
slice(71:86) %>%
mutate(geometry = st_make_valid(geometry)) %>%
summarise(geometry = st_union(geometry)) %>%
mutate(name = "上海市") %>%
select(name,geometry)
china <- china %>%
slice(-71:-86) %>%
bind_rows(shanghai)
###Chongqing
chongqing <- china %>%
slice(250:287) %>%
mutate(geometry = st_make_valid(geometry)) %>%
summarise(geometry = st_union(geometry)) %>%
mutate(name = "重庆市") %>%
select(name,geometry)
china <- china %>%
slice(-250:-287) %>%
bind_rows(chongqing) Another issue is that our analysis does not include Taiwan, Hong Kong, and Macau, as they were not impacted directly by the Chinese Communist Revolution. Therefore, the next step is to remove the polygons for these regions.
china <- china %>% slice(c(-364:-391))
china <- china %>% slice(-233)
china_sf <- sf::st_transform(china, crs = 3857) %>% rename(city = name)
china_map <- mapview(china_sf, legend = FALSE)
china_map2.2. Load Martyrs’ Memorial Facilities
After loading the data, I conducted an initial visualization of the distribution of martyrs’ memorial facilities in China. Due to the large sample size, I randomly selected 500 points and used mapview to display their distribution on a map.
mf <- read_excel("data/Martyrs’ Memorial Facilities.xlsx")
mf_sf <- st_as_sf(mf, coords = c("longitude", "latitude"), crs = 4326) %>%
sf::st_transform(3857)
sample_mf_sf <- slice_sample(mf_sf, n = 500)
mapview(sample_mf_sf) + china_map2.2. Count Martyrs’ Memorial Facilities in Each City
Here, I used st_join to calculate the number of martyrs’ memorial facilities contained within each city.
china_sf <- st_make_valid(china_sf)
mf_sf <- mf_sf %>% select(-city)
joined_sf <- st_join(mf_sf, china_sf, join = st_within)
point_counts <- joined_sf %>%
group_by(city) %>%
summarize(facilities = n())
point_counts <- point_counts %>% st_drop_geometry()
china_sf <- china_sf %>%
left_join(point_counts, by = "city")NA should mean no facilities. Therefoere, replace na with 0.
china_sf$facilities[is.na(china_sf$facilities)] <- 0Then, we can visualize the city-level distribution of martyrs’ memorial facilities.
china_sf |> ggplot(aes(fill=facilities)) +
geom_sf() +
scale_fill_viridis_c(option="C") +
theme_classic()
3. Clan Network and China’s Communist Revolution
Clan networks were a crucial soft organizational structure in traditional Chinese society. In the traditional Chinese governance system, the central government’s administrative control extended only to the county level, leaving governance below that level largely reliant on local clan networks, especially the management of prominent gentry. Clan networks took on many functions that modern governments fulfill today, particularly in the provision of public goods. For instance, they mediated disputes, administered justice (providing “fairness”), and facilitated education. Many traditional educational institutions, such as private academies (sishu), were often funded jointly by clan members and used clan-owned properties as venues.
However, in the traditional narratives of the Chinese Communist Party (CCP), especially before the reform and opening-up era, clans were seen as part of the feudal superstructure and thus symbols of conservatism, backwardness, and autocracy. Clan network leaders were often landlords, who were depicted as the primary “counter-revolutionaries” during the Communist Revolution and the targets of land reform and subsequent political campaigns (e.g., landlords were part of the “Five Black Categories”). This negative portrayal stemmed partly from Marxist historical materialism and partly from the practical needs of revolutionary struggle. In CCP narratives, landlord groups were strong supporters of the Chiang Kai-shek regime and, thus, unforgivable class enemies in the battle for power.
This narrative, however, is largely a construction, as local landlords were not closely tied to the Nationalist Party (KMT). In fact, KMT failed to establish solid control at village level. The negative propaganda against landlords was primarily driven by the political mobilization needs of the revolution. Before the Civil War, the CCP faced the challenge of rallying public support to overthrow the KMT government, which still enjoyed high prestige and legitimacy following the War of Resistance against Japan. China had emerged as a victor in World War II and a permanent member of the United Nations Security Council, marking significant advancements in its international standing. To overcome this legitimacy, the CCP launched a series of land reform campaigns in its controlled areas, inciting peasants to denounce landlords and redistributing their land and assets to the people.
The CCP also sought to link hatred of landlords with hostility toward the Chiang regime. One infamous practice was the Wang Jiang Gan (“looking at Chiang pole”). This involved tying landlords to a pole and gradually hoisting them up while asking if they could “see Chiang Kai-shek.” If the landlord answered no, they were hoisted higher; if they answered yes, they will die from dropping. Regardless of the response, the ultimate fate was often death by dropping.
Despite the CCP’s animosity toward landlords and the traditional clan networks they represented, it is surprising to find that many Communist leaders came from landlord families. Mao Zedong himself was born into a landlord family. This might be because early Communist revolutionaries were intellectuals, and literacy and access to higher education were crucial for exposure to and understanding of Communist ideology. Clan networks traditionally emphasized education, and the relationship between clan support and success in the imperial examination system (keju) has been well-documented. Clans often funded public education (sishu) and, in the early 20th century, could financially support young members pursuing higher or even Western education.
Thus, I plan to study the impact of traditional Chinese clan networks on the Communist Revolution. To operationalize this research, I will use the number of genealogies per capita from the Qing Dynasty and earlier to reflect the historical strength of clan networks, as genealogies are a significant measure of clan culture intensity. Additionally, I will use the number of martyrs’ memorial facilities per capita to measure revolutionary culture.
3.1. Load Genealogy Data
Unfortunately, I also found the genealogical data for each prefecture-level city on the same forum, so I have not been able to identify its author either. However, in subsequent visualization results, we can see that areas with high genealogical density align well with common knowledge about regions with strong clan culture. Jiangsu and Zhejiang (the eastern regions of China near Shanghai) have the highest number of genealogies. Therefore, I have chosen to trust the quality of this dataset. The dataset includes the number of genealogies for each city from the Song, Yuan, Ming, and Qing dynasties, as well as the population in 1990.
gen <- read_excel("data/Genealogy.xlsx")
gen <- gen %>%
mutate(across(c(Song, Yuan, Ming, Qing), ~replace(., is.na(.), 0))) %>%
mutate(genealogies = Song + Yuan + Ming + Qing) %>%
select(-Song, -Yuan, -Ming, -Qing)Here, I joined the new dataset with the main martyrs’ memorial facilities dataset, china_sf.
china_sf <- china_sf %>%
left_join(gen, by = "city") china_sf <- china_sf %>% select(city, province, pop, facilities, genealogies, geometry)As mentioned above, the map shows that Jiangsu and Zhejiang have the strongest clan culture intensity, which aligns with our common knowledge.
china_sf |> ggplot(aes(fill=genealogies)) +
geom_sf() +
scale_fill_viridis_c(option="C") +
theme_classic()
For the missing values in the genealogical data, we can observe from the map that these are concentrated in non-Han areas in western China. Since clan networks are a phenomenon specific to Han culture, particularly prevalent in southeastern China, it is reasonable to fill the missing values with 0.
china_sf$genealogies[is.na(china_sf$genealogies)] <- 03.2. Population Data Manipulation
Since I plan to “standardize” the independent and dependent variables using per capita values, population is also an important variable. Therefore, I first visualized the population distribution in China in 1990.
china_sf |> ggplot(aes(fill=pop)) +
geom_sf() +
scale_fill_viridis_c(option="C") +
theme_classic()
As seen in the figure, there are many missing values in the population data, and it is not feasible to fill them with 0. Therefore, I plan to fill the missing values with the average population of the five nearest neighbors.
china_sf$centroid <- st_centroid(china_sf$geometry)
china_nb <- knn2nb(knearneigh(china_sf$centroid, k = 5))
china_listw <- nb2listw(china_nb, style = "W")
china_sf$pop_mean_neighbors <- sapply(1:nrow(china_sf), function(i) {
neighbors <- china_nb[[i]]
neighbor_pops <- china_sf$pop[neighbors]
mean_pop <- mean(neighbor_pops, na.rm = TRUE)
return(mean_pop)
})
china_sf$pop <- ifelse(is.na(china_sf$pop), china_sf$pop_mean_neighbors, china_sf$pop)
china_sf |> ggplot(aes(fill=pop)) +
geom_sf() +
scale_fill_viridis_c(option="C") +
theme_classic()
As shown in the figure, all the missing population values have now been filled, and the filled areas appear very smooth without producing outliers that differ significantly from the surrounding cities.
Now, here is no missing values in china_sf
china_sf <- china_sf %>% select(-centroid, -pop_mean_neighbors)Then, we can use population to “standardize” two major variables.
3.3. Preliminary Regression
However, in the subsequent regression, I attempted to use the log form of the two variables, but taking the log of 0 would result in a large number of missing values. Therefore, I first added 1 to both main variables. This approach minimizes estimation bias while significantly preserving the sample size.
china_sf <- china_sf %>%
mutate(facap = 100*(1+facilities)/pop) %>%
mutate(gecap = 100*(1+genealogies)/pop)Here, I attempt to gain an initial intuition about the relationship between the two variables through a scatter plot.
ggplot(china_sf, aes(x = gecap, y = facap)) +
geom_point() +
labs(x = "Genealogies", y = "Facilities", title = "Scatter plot of Facilities vs Genealogies") +
theme_minimal() 
This plot looks strange due to the presence of a severe outlier. Therefore, the next step is to identify and remove this outlier. The method I chose is to replace the outlier’s two variables with their respective medians.
facap_above_3000 <- china_sf %>% filter(facap > 3000)
facap_above_3000Simple feature collection with 1 feature and 7 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 8726952 ymin: 3462402 xmax: 9610049 ymax: 4281466
Projected CRS: WGS 84 / Pseudo-Mercator
city province pop facilities genealogies
1 阿里地区 西藏自治区 0.000613 7 0
geometry facap gecap
1 POLYGON ((9581332 4281466, ... 1305057 163132.1
china_sf <- china_sf %>%
mutate(
facap = ifelse(facap > 3000, median(facap), facap),
gecap = ifelse(gecap > 3000, median(gecap), gecap)
)ggplot(china_sf, aes(x = gecap, y = facap)) +
geom_point() +
labs(x = "Genealogies", y = "Facilities", title = "Scatter plot of Facilities vs Genealogies") +
theme_minimal() 
Now, we can finally observe the relationship between the two variables. However, there does not seem to be a clear linear relationship between them. Therefore, I plan to take the log of both variables to examine any potential non-linear relationship between them.
china_sf <- china_sf %>%
mutate(lfacap = log(facap)) %>%
mutate(lgecap = log(gecap))ggplot(china_sf, aes(x = lgecap, y = lfacap)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ poly(x, 2), color = "blue", se = FALSE) +
labs(x = "Genealogies", y = "Facilities", title = "Scatter plot of Facilities vs Genealogies") +
theme_minimal()
It is clear that there is a significant inverted U-shaped relationship between the two variables. This is actually not surprising, as I previously discussed the relationship between clan culture and education, and the importance of education in accessing communist ideology. Therefore, when clan culture is weak, increasing clan culture can improve educational opportunities, which in turn increases the likelihood of individuals being exposed to communist ideas, leading to a positive correlation between the two variables. However, when clan power is strong, extreme ideologies like communism are suppressed, especially because clan culture often carries strong conservative tendencies.
Therefore, I hypothesize: H1: The clan culture has a negative relationship with red culture/China’ communist revolution.
china_sf <- china_sf %>%
mutate(lgecap2 = lgecap^2)Next, to test the significance of this inverted U-shaped relationship, I performed a simple regression that includes the squared term of the independent variable.
ols_fit <- lm(lfacap ~ lgecap + lgecap2, data=china_sf)
result_df <- broom::tidy(ols_fit)
result_df$p.value <- formatC(result_df$p.value, format = "f", digits = 3)
result_df# A tibble: 3 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <chr>
1 (Intercept) 1.30 0.0606 21.5 0.000
2 lgecap 0.266 0.0611 4.35 0.000
3 lgecap2 -0.0559 0.0143 -3.90 0.000
The results show that both the independent variable and its squared term are significant at the 0.001 level. Therefore, H1 is supported.
Here, I visualized the two logged variables, so we can see they relationship graphically.
china_lgecap <- china_sf |> ggplot(aes(fill=lgecap)) +
geom_sf() +
scale_fill_viridis_c(option="C") +
theme_classic()
china_lfacap <- china_sf |> ggplot(aes(fill=lfacap)) +
geom_sf() +
scale_fill_viridis_c(option="C") +
theme_classic()
china_lgecap+china_lfacap
Besides, to look into the relationship between two variables more closely, I try to zoom in southeast China, where is the origon of China’s communist revolution.
china_sf$province[china_sf$city == "襄阳市"] <- "湖北省"
southeast_china_sf <- china_sf %>% filter(province %in% c("江西省", "福建省", "广东省", "安徽省", "浙江省", "江苏省", "湖南省", "湖北省", "上海市"))southeast_lgecap <- southeast_china_sf |> ggplot(aes(fill=lgecap)) +
geom_sf() +
scale_fill_viridis_c(option="C") +
theme_classic()
southeast_lfacap <- southeast_china_sf |> ggplot(aes(fill=lfacap)) +
geom_sf() +
scale_fill_viridis_c(option="C") +
theme_classic()
southeast_lgecap+southeast_lfacap
It’s clear that clan networks are well-developed throughout southeastern China, especially in the Zhejiang region in the eastern part. On the other hand, martyr memorial facilities are highly concentrated in Jiangxi Province (the central part of the map), while their density is much lower in the western province of Hunan and the northeastern provinces of Jiangsu and Zhejiang. Interestingly, there is a distinct contrast between Jiangxi and Hunan, a phenomenon that is further discussed below. But overall, we can see a negative relationship between clan culture and red culture in Jiangsu and Zhejiang, where exist the strongest clan culture, supporting H1 further.
3.4. Spatial Regression
china_sf$resid <- ols_fit$residuals
china_sf |> ggplot(aes(x=lgecap, y=resid)) +
geom_point() +
geom_hline(yintercept=0, color=cb_palette[1], linewidth=1) +
theme_classic()
However, the visualization results also raise concerns about spatial autocorrelation. We found that both clan culture and revolutionary culture exhibit strong clustering. Therefore, it is necessary to test whether spatial autocorrelation exists in the regression above.
i_resid <- moran.test(china_sf$resid, china_listw)
i_resid
Moran I test under randomisation
data: china_sf$resid
weights: china_listw
Moran I statistic standard deviate = 12.164, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.3763306541 -0.0027397260 0.0009712037
Moran’s I of residual is 0.376, which is not too strong, but still need to be handled. I run a spatial regression to reduce the bias caused by spatial autocorrelation.
ldv_fit <- lagsarlm(lfacap ~ lgecap + lgecap2, data=china_sf, listw=china_listw)
ldv_result_df <- broom::tidy(ldv_fit)
ldv_result_df$p.value <- formatC(ldv_result_df$p.value, format = "f", digits = 3)
ldv_result_df# A tibble: 4 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <chr>
1 rho 0.584 0.0532 11.0 0.000
2 (Intercept) 0.522 0.0890 5.87 0.000
3 lgecap 0.173 0.0518 3.35 0.001
4 lgecap2 -0.0309 0.0121 -2.56 0.010
The independent variable and its squared term remain significant, but their significance has decreased because part of the data’s information is used to estimate the impact of spatial autocorrelation. Meanwhile, rho is significant at the 0.001 level, indicating that spatial autocorrelation does indeed exist.
china_sf$sp_resid <- ldv_fit$resid
china_sf |> ggplot(aes(x=lgecap, y=sp_resid)) +
geom_point() +
geom_hline(yintercept=0, color=cb_palette[1], linewidth=1) +
theme_classic()
i_resid <- moran.test(china_sf$sp_resid, china_listw)
i_resid
Moran I test under randomisation
data: china_sf$sp_resid
weights: china_listw
Moran I statistic standard deviate = -0.59839, p-value = 0.7252
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
-0.0213668594 -0.0027397260 0.0009689886
After controlling for spatial autocorrelation, the Moran’s I of the residuals decreased to -0.02, which is very close to 0, indicating that spatial autocorrelation has been effectively addressed.
3.5. Equilibrium Effects
Given the spatial regression coefficient \(\rho\), and the non-spatial regression coefficient vector \(\boldsymbol\beta = (\beta_0, \beta_1, \beta_2)\), the long-run equilibrium value for \(Y\) is given by
\[ Y^* = (I - \rho\mathbf{W})^{-1}(X\boldsymbol\beta), \]
Using the formula above, we can calculate what happens to the number of martyr memorial facilities in Ganzhou, the capital of the Chinese Soviet Republic, and its surrounding areas when the clan network power in the region doubles. To focus this study on the Chinese Communist Revolution, I limit the research scope to southeastern China. This is because many memorial facilities in the north commemorate martyrs from the War of Resistance Against Japan, which are less directly related to the Communist Revolution. Southeastern China, however, was the birthplace of the early armed revolution of the Chinese Communist Party.
china_sf <- china_sf %>%
mutate(southeast = if_else(province %in% c("江西省", "福建省", "广东省", "安徽省", "浙江省", "江苏省", "湖南省", "湖北省", "上海市"), 1, 0))target_city <- "赣州市"
# Create a dummy variable in it_sf which is 1 for this target collegio
china_sf <- china_sf |> mutate(
is_target = factor(city == target_city)
)
# Use sf to create a single geometry which is just a circle around the centroid
# of the target collegio. Since 50km was our cutoff for being considered a
# neighbor, we use 50km here for the radius of the circle
circ_radius <- 200000
circ_sf <- china_sf |> filter(is_target == TRUE) |> sf::st_centroid() |>
sf::st_buffer(circ_radius) |> sf::st_boundary()Warning: st_centroid assumes attributes are constant over geometries
china_sf %>% filter(southeast == 1) %>% ggplot() +
geom_sf(aes(fill=is_target)) +
geom_sf(data=circ_sf, linetype="dashed") +
scale_fill_manual(values=c(cb_palette[1], cb_palette[2])) +
theme_classic()
This diagram is a conceptual illustration, showing the target city where the independent variable is doubled in our “laboratory” and the potential spillover effects of this change on surrounding cities.
which(china_sf$is_target == TRUE)[1] 126
Here, I created a counterfactual variable for the number of genealogies per capita. The only difference between this counterfactual variable and the original variable is that the value in row 126, corresponding to Ganzhou City, is doubled from the original value.
X_obs <- cbind(1, china_sf$lgecap, china_sf$lgecap2)
china_sf$gecap_cf <- china_sf$gecap
china_sf$gecap_cf[126] <- china_sf$gecap_cf[126] * 2
X_cf <- cbind(1, log(china_sf$gecap_cf), log(china_sf$gecap_cf)^2)
which(apply(X_obs != X_cf, 1, any))[1] 126
Here, we can use the formula to calculate the difference between the equilibrium values of the dependent variable derived from the original independent variable and the counterfactual variable.
compute_y_eq <- function(X, beta, rho, nb_obj) {
# Create an identity matrix
I <- diag(nrow(X))
W <- nb2mat(nb_obj, style="W")
y_eq <- solve(I - rho * W) %*% (X %*% beta)
return(y_eq)
}
estimates <- ldv_result_df$estimate
beta <- c(estimates[2], estimates[3], estimates[4])
rho <- estimates[1]
lfacap_eq <- compute_y_eq(X_obs, beta, rho, china_nb)
lfacap_cf <- compute_y_eq(X_cf, beta, rho, china_nb)
china_sf$lfacap_diff <- as.vector(lfacap_cf - lfacap_eq)We can see that the differences are less than 0. Therefore, I visualized the regions where the size of the change in the dependent variable is greater than 0.001.
china_sf |> filter(southeast == 1) |>
mutate(lfacap_diff = ifelse(lfacap_diff > -0.001, NA, lfacap_diff)) |>
ggplot(aes(fill=lfacap_diff)) +
geom_sf() +
scale_fill_viridis_c(option="C") +
theme_classic()
As shown in the figure, doubling clan culture leads to a decrease in the logged number of per capita martyr memorial facilities. Moreover, this negative effect exhibits a spillover effect that gradually diminishes in surrounding cities.
4. The Distribution of Martyrs’ Memorial Facilities
4.1. Visualize The Density of Martyrs’ Memorial Facilities
# generate the window
southeast_china_sfc <- southeast_china_sf %>%
select(geometry) %>%
st_buffer(300) %>%
st_union()
mapview(southeast_china_sfc)southeast_mf_sf <- mf_sf[st_within(mf_sf, southeast_china_sfc, sparse = FALSE), ]
mf_ppp <- as.ppp(sf::st_as_sfc(southeast_mf_sf), W=as.owin(southeast_china_sfc))Warning: data contain duplicated points
mf_density <- density(mf_ppp, sigma=50000)
plot(mf_density)
4.2. Visualize The Population Density
To depict the population density, we first need to sample point data based on the proportion of each city’s population relative to the entire region. Here, I choose to sample 3000 points.
nsplit = function(X,n){
p = X/sum(X)
diff(round(n*cumsum(c(0,p))))
}
southeast_china_sf$sample <- nsplit(southeast_china_sf$pop, 3000)
southeast_china_sf <- southeast_china_sf %>%
mutate(sample = if_else(sample == 0, 1, sample))
sampled_points <- st_sample(southeast_china_sf, size = southeast_china_sf$sample, type = 'random')
ggplot() +
geom_sf(data = southeast_china_sf, fill = "lightblue", color = "black", alpha = 0.5) +
geom_sf(data = sampled_points, color = "red", size = 0.5) +
theme_minimal() +
labs(title = "Randomly Sampled Points by Population Weight")
Now, we can construct the population intensity function.
pop_ppp <- as.ppp(sampled_points, W=as.owin(southeast_china_sfc))
pop_density <- density(pop_ppp)
plot(pop_density)
4.3. Revolution and Population
Theoretically, communist revolutions are more likely to develop in areas with low population density, as such regions are often less suitable for human settlement due to rugged terrain. However, rugged terrain can benefit revolutionaries by providing hiding spots and opportunities to use the landscape to conduct guerrilla warfare against government forces, especially when they are outnumbered and outgunned. Therefore, I propose the following hypothesis:
H2: Areas with lower population density will exhibit a higher density of martyrs’ memorial facilities.
num_regions <- 3
region_labels <- c("Low", "Medium", "High")
pop_vals <- pop_density
pop_quant <- quantile(pop_vals, probs=(0:num_regions)/num_regions, na.rm=TRUE)
pop_cut <- cut(pop_vals, breaks=pop_quant, labels=region_labels)
pop_areas <- tess(image=pop_cut)
plot(pop_areas)
obs_mf_counts <- quadratcount(mf_ppp, tess=pop_areas) |> as.vector()
names(obs_mf_counts) <- region_labels
obs_mf_counts Low Medium High
533 519 502
After dividing the region into high, medium, and low population density areas, the next step involves using Monte Carlo Simulation to simulate the distribution of martyrs’ memorial facilities entirely based on population density 999 times. I then plotted the probability distribution of the number of martyrs’ memorial facilities in low-population-density areas from the simulations and compared it to the actual observed value.
compute_quadrat_counts <- function(sim_ppp) {
sim_counts <- quadratcount(sim_ppp, tess=pop_areas) |> as.vector()
names(sim_counts) <- region_labels
return(sim_counts)
}
gen_sims_ppp <- function(num_sims) {
prot_sims <- spatstat.random::rpoint(
n = nrow(southeast_mf_sf),
f = pop_density,
nsim = num_sims
)
return(prot_sims)
}
full_sims_list <- gen_sims_ppp(num_sims = 999)
full_sim_area_counts <- lapply(X=full_sims_list, FUN=compute_quadrat_counts)
full_count_df <- as_tibble(full_sim_area_counts) |> t() |> as_tibble()Warning: The `x` argument of `as_tibble.matrix()` must have unique column names if
`.name_repair` is omitted as of tibble 2.0.0.
ℹ Using compatibility `.name_repair`.
colnames(full_count_df) <- region_labels
full_count_df |> head()# A tibble: 6 × 3
Low Medium High
<int> <int> <int>
1 398 472 684
2 428 443 683
3 360 483 711
4 375 495 684
5 401 493 660
6 406 505 643
mc_df <- bind_rows(full_count_df, obs_mf_counts)
full_count_df |> ggplot(aes(x=Low)) +
geom_density(fill=cb_palette[2], alpha=0.5) +
geom_vline(xintercept = obs_mf_counts["Low"], linetype="dashed", color=cb_palette[1]) +
theme_classic()
As shown in the figure, the actual number of martyrs’ memorial facilities in low-population-density areas is 533, while the simulated values are concentrated around 390. Therefore, if the null hypothesis holds—i.e., martyrs’ memorial facilities are distributed according to population density—the observed value is extremely unlikely to be observed, as it lies at the far right tail of the probability distribution. This indicates that martyrs’ memorial facilities are significantly concentrated in low-population-density areas. H2 is supported.
q4_more_extreme_df <- mc_df[mc_df$Low >= obs_mf_counts["Low"],]
q4_prop_more_extreme <- nrow(q4_more_extreme_df) / nrow(mc_df)
q4_prop_more_extreme[1] 0.001
Similarly, we can calculate the proportion of values as extreme as or more extreme than the observed value relative to the total sample (1000). The result is 0.001, indicating that in 999 simulations, no values as extreme as or more extreme than the observed value were found. This provides stronger support for H2.
4.4. Distance to Capital and Revolution
The distance from the capital reflects the cost for the government to effectively exercise administrative control and suppress the revolution in that area. The farther a location is from the capital, the higher the cost for the government to suppress the revolution. On the other hand, the closer a location is to the capital, the harder it is for revolutionary parties to spread their ideology, develop party members, organize, and survive. Therefore, I use the distance from the Mansion of the President in Nanjing to measure the distance from the capital and propose the hypothesis:
H3: Martyrs’ memorial facilities tend to be distributed in areas farther from the capital.
mp_coords <- c(32.04484, 118.79236)
mp_df <- tibble(name=c("Mansion of the President of Nanjing"), lat=mp_coords[1], lon=mp_coords[2])
mp_sf <- sf::st_as_sf(
mp_df,
coords=c("lon", "lat"),
crs=4326
) |> sf::st_transform(3857)
mapview(mp_sf)Next, I will compare the actual average distance from the capital to the martyrs’ memorial facilities with the average distance from the capital obtained from the previous Monte Carlo simulation results in the same way.
all_distances <- southeast_mf_sf |> sf::st_distance(mp_sf)
obs_mean_dist <- as.numeric(mean(all_distances))
obs_mean_dist[1] 763375.8
The actual average distance from the capital to the martyrs’ memorial facilities is 763375.8 m.
compute_mean_dist <- function(sim_ppp) {
sim_prot_sf <- sim_ppp |> sf::st_as_sf() |> sf::st_set_crs(3857) |> filter(label == "point")
sim_dists <- sim_prot_sf |> sf::st_distance(mp_sf)
return(mean(as.numeric(sim_dists)))
}
sim_dists <- lapply(X=full_sims_list, FUN=compute_mean_dist)
sim_dists |> head()$`Simulation 1`
[1] 648980.6
$`Simulation 2`
[1] 638051.6
$`Simulation 3`
[1] 646935.5
$`Simulation 4`
[1] 662433.3
$`Simulation 5`
[1] 654923.4
$`Simulation 6`
[1] 672673.5
sim_dist_df <- unlist(sim_dists) |> as_tibble()
obs_dist_df <- tibble(value = obs_mean_dist)
mc_dist_df <- bind_rows(sim_dist_df, obs_dist_df)
mc_dist_df |> ggplot(aes(x=value)) +
geom_density(fill=cb_palette[2], alpha=0.5) +
geom_vline(xintercept = obs_mean_dist, linetype="dashed", color=cb_palette[1]) +
theme_classic()
q5_more_extreme_df <- mc_dist_df[mc_dist_df$value >= obs_mean_dist,]
q5_more_extreme_prop <- nrow(q5_more_extreme_df) / nrow(mc_dist_df)
q5_more_extreme_prop[1] 0.001
Both graph and p-value support H3, martyrs’ memorial facilities tend to be distributed in areas farther from the capital.
4.5. Province Boundary and Revolutionary Strategy
Jiangxi Province served as the base of the Chinese Soviet Republic and the cradle of the CCP’s armed revolution. Observing the boundaries of Jiangxi Province, I discovered an intriguing phenomenon. Along the border between Jiangxi and Hunan Province (to the west of Jiangxi), the number of martyrs’ memorial facilities on the Hunan side of the border is sparse and highly dispersed, particularly near the boundary. In contrast, on the Jiangxi side, the distribution of martyrs’ memorial facilities is much denser, especially near the border. This observation might provide insights into the strategic considerations of the Chinese Communist Revolution.
In Mao Zedong’s 1928 essay “Why is it that Red Political Power can Exist in China?” he elaborates on the survival strategy of Red political power:
“The long-term survival inside a country of one or more small areas under Red political power completely encircled by a White regime is a phenomenon that has never occurred anywhere else in the world. There are special reasons for this unusual phenomenon. It can exist and develop only under certain conditions. … For this unusual phenomenon can occur only in conjunction with another unusual phenomenon, namely, war within the White regime. It is a feature of semicolonial China that, since the first year of the Republic [1912] the various cliques of old and new warlords have waged incessant wars against one another, supported by imperialism from abroad and by the comprador and landlord classes at home. … The prolonged splits and wars within the White regime provide a condition for the emergence and persistence of one or more small Red areas under the leadership of the Communist Party amidst the encirclement of the White regime. The independent regime carved out on the borders of Hunan and Jiangxi Provinces is one of many such small areas.”
Therefore, I plan to examine the relationship between the distribution of martyrs’ memorial facilities and their distance from the Jiangxi Province border. To begin, I visualized the boundary of Jiangxi Province alongside the martyrs’ memorial facilities.
jiangxi_sf <- southeast_china_sf %>% filter(province == "江西省")jiangxi_bd <- jiangxi_sf %>% st_union %>% st_boundary() %>% st_as_sf()
mapview(southeast_mf_sf) + mapview(jiangxi_bd)Next, I calculated the distance of all martyrs’ memorial facilities from the Jiangxi Province border, recording distances within the province as negative values. Finally, I plotted the density distribution of these distances.
jiangxi_polygon <- st_cast(jiangxi_bd, "POLYGON")
in_sf <- southeast_mf_sf %>%
filter(st_intersects(southeast_mf_sf, jiangxi_polygon, sparse = FALSE))Warning: Using one column matrices in `filter()` was deprecated in dplyr 1.1.0.
ℹ Please use one dimensional logical vectors instead.
out_sf <- southeast_mf_sf %>%
filter(!st_intersects(southeast_mf_sf, jiangxi_polygon, sparse = FALSE))in_sf$distance <- in_sf %>% st_distance(jiangxi_bd) %>% as.vector
in_sf <- in_sf %>% mutate(distance = -distance)
out_sf$distance <- out_sf %>% st_distance(jiangxi_bd) %>% as.vector
distance = rbind(in_sf, out_sf)ggplot(distance, aes(x = distance)) +
geom_density(fill = cb_palette[2], alpha = 0.5) +
labs(
title = "Density Plot of Distance",
x = "Distance",
y = "Density"
) +
geom_vline(xintercept = 0, linetype="dashed", color=cb_palette[1]) +
theme_classic()
As shown in the figure, martyrs’ memorial facilities are densely concentrated near the inner side of Jiangxi Province’s border (the region to the left of x = 0). Within Jiangxi Province, density decreases the farther from the border. A similar pattern is observed outside Jiangxi, but around 200 km from the border, a new peak emerges, indicating the presence of other “red regimes” apart from the Central Soviet Area.
From this, we can draw two preliminary conclusions. First, China’s communist revolution can be characterized as a “border revolution,” with red regimes predominantly established at the intersection of political boundaries. Second, it is a “dispersed revolution,” where apart from the Central Soviet Area, other red regimes emerged at an “optimal” distance from Jiangxi’s border. This spacing minimized the risk of suppression while remaining close enough to facilitate communication and mutual support among the bases.
distance_filtered <- distance %>% filter(distance < abs(min(distance)))
ggplot(distance_filtered, aes(x = distance)) +
geom_density(fill = cb_palette[2], alpha = 0.5) +
labs(
title = "Density Plot of Distance",
x = "Distance",
y = "Density"
) +
geom_vline(xintercept = 0, linetype="dashed", color=cb_palette[1]) +
theme_classic()
When zooming in on the observation window to ensure that the maximum distances inside and outside the provincial border are equal, the above patterns remain consistent.
Next, I plan to use st_buffer in increments of 5,000 meters to calculate the number of martyrs’ memorial facilities within various distance ranges inside and outside Jiangxi Province’s border (e.g., 0–5,000 meters, 5,000–10,000 meters, and so on). This approach will allow us to examine the distribution patterns of China’s red regimes more closely.
generate_buffer_counts <- function(times) {
distance <- 5000 * times
jiangxi_buffer <- jiangxi_bd %>% st_buffer(dist = distance)
in_buffer <- sum(st_intersects(in_sf, jiangxi_buffer, sparse = FALSE))
out_buffer <- sum(st_intersects(out_sf, jiangxi_buffer, sparse = FALSE))
df <- data.frame(
distance = c(-distance, distance),
counts = c(in_buffer, out_buffer))
return(df)
}buffer_counts <- generate_buffer_counts(1)
buffer_counts distance counts
1 -5000 18
2 5000 5
generate_between_buffer_counts <- function(times) {
big_distance <- 5000 * times
small_distance <- 5000 * (times-1)
big_buffer <- jiangxi_bd %>% st_buffer(dist = big_distance)
small_buffer <- jiangxi_bd %>% st_buffer(dist = small_distance)
buffer_difference <- st_difference(big_buffer, small_buffer)
in_buffer <- sum(st_intersects(in_sf, buffer_difference, sparse = FALSE))
out_buffer <- sum(st_intersects(out_sf, buffer_difference, sparse = FALSE))
df <- data.frame(
distance = c(-big_distance, big_distance),
counts = c(in_buffer, out_buffer))
return(df)
}generate_between_buffer_counts(2) distance counts
1 -10000 50
2 10000 7
for (times in 2:10) {
# Get the dataframe for this value of times
temp_df <- generate_between_buffer_counts(times)
# Combine it with the existing buffer_counts dataframe
buffer_counts <- rbind(buffer_counts, temp_df)
}
buffer_counts <<- buffer_counts
buffer_counts distance counts
1 -5000 18
2 5000 5
3 -10000 50
4 10000 7
5 -15000 27
6 15000 10
7 -20000 29
8 20000 9
9 -25000 15
10 25000 7
11 -30000 25
12 30000 11
13 -35000 18
14 35000 14
15 -40000 21
16 40000 10
17 -45000 17
18 45000 6
19 -50000 10
20 50000 9
ggplot(buffer_counts, aes(x = distance, y = counts)) +
geom_line(color = cb_palette[2], size = 1) +
geom_point(color = cb_palette[1], size = 2) + # Add points to highlight values
labs(
title = "Line Plot of Counts by Distance",
x = "Distance",
y = "Counts"
) +
theme_minimal() +
geom_vline(xintercept = 0, linetype="dashed", color=cb_palette[1]) +
theme(
plot.title = element_text(hjust = 0.5) # Center the title
)Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.

As shown in the figure, there are 18 facilities within 0–5,000 meters inside the provincial border, while the number reaches an astonishing 50 in the 5,000–10,000 meter range. However, outside the provincial border, the numbers are only 5 and 7 in these respective distance ranges. This stark contrast between inside and outside the border provides insights into the strategic considerations of the Chinese Communist revolution. The red regimes were primarily concentrated 5,000–10,000 meters inside the provincial border. This distance likely created a buffer zone from external provinces, protecting the regime from direct attacks by warlords outside Jiangxi Province. At the same time, the distance was short enough to allow the regime to retreat quickly to neighboring provinces when under attack from warlords within Jiangxi Province.
To some extent, the “free-rider problem” and “tragedy of the commons” among warlords in different provinces created a survival space for the Chinese Communist regime. While the optimal solution for the warlords would have been to unite and eliminate the red regime, once the Communists were driven out of their territory, warlords often stopped pursuing them further, as the conflict then shifted to another warlord’s domain. Moreover, warlords were generally reluctant to expend their own military resources or hoped neighboring warlords would take on the task of eliminating the Communist regime. This “free-rider” behavior ultimately resulted in no single warlord committing fully to suppression efforts, allowing the red regime to survive.